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This paper is intended to serve as an introduction to the use of the Kalman filter in modeling atomic 
clocks and obtaining maximum likelihood estimates of the model paramaters from data on an ensemble 
of clocks. Tests for the validity of the model and confidence intervals for the parameter estimates are discussed. 
Techniques for dealing with unequally spaced and partially or completely missing multivariate data are 
described. The existence of deterministic frequency drifts in clocks is established and estimates of the drifts 
are obtained. 
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1. Introduction 

The recursive updating of least squares estimates that would later become a special case of the Kalman "filter" 
was apparently known to Gauss [I], 1 and published in the modern statistical literature by Plackett [2]. Kalman 's 
contribution [3] was the generalization to dynamic systems. Kalman 's filter has found immense application in diverse 
areas of engineering. In classical applications,, recursive estimates are obtained of the "state" of the system but 
parameters appearing elsewhere in the state space representation for the system and the mathematical model for 
the system must be considered known. 

In our application, Kalman 's recursive equations allow us to compute the likelihood function for given values 
of parameters occurring anywhere in the state space representation. Nonlinear optimization techniques can then 
be used to find the maximum, likelihood estimates. The recursive residuals, or innovations, may be examined to 
judge the adequacy-of-fit of the model, and generalized likelihood ratio tests may be used to test the significance 
of model parameters such as frequency drifts and obtain confidence intervals for the estimated parameters. Kalman 
filter techniques make it quite easy to deal with unequally spaced and completely or partially missing multivariate 
data. We describe these methods in this paper. 

The next article in this issue, "Estimating Time from Atomic Clocks, " by Jones and Tryon [4] describes statistical 
procedures for detecting clock errors and allowing for the insertion or deletion of clocks in the ensemble during the 
parameter estimation process. It also describes the development of a time scale algorithm based on the model developed 
in this paper. 

2. Models For Atomic Clocks 

A cesium atomic clock is a feedback control device whose frequency locks onto the fundamental resonance of the 
cesium atom at (ideally) 9,192,631,770 Hz, which defines the second. Frequency bias due to fundamental 
instrumentation limitations and environmental effects are determined by calibration against primary frequency 
standards. Stochastic fluctuations arise from shot noise in the cesium beam and the probabilistic nature of quantum 



'Deceased. Dr. Tryon served with the Center for Applied Mathematics, National Engineering Laboratory, 
**Division of Biometrics, Box B-l 19, School of Medicine, University of Colorado, Denver, CO 80262. 
'Figures in brackets indicate the literature references at the end of this paper. 



mechanical transition rates which cause white noise in the frequency of the clock which is integrated into a random 
walk in time. In addition, empirical studies have demonstrated that frequency wanders independently as a random 
walk, introducing the integral of a random walk into time. 

Finally, there is considerable international debate over the inclusion of linear drift in frequency to account for 
break-in and aging. Such drifts could be strictly constant over a long time period, or allowed to change slowly as 
a third random walk. The methods developed in this paper provide, for the first time, valid statistical tests of these 
hypotheses. 

The proposed stochastic model for a clock's behavior is 



x(t) =*(t-l) + dU)y(t-l) + i 6 2 (t)wit-l) + e(t) 
y(f) = y<f-l) + d(tMt-l) + ri(t) 

wit) = wit-1) + ati) 



(2.1) 



where 

x[t) is the clock's deviation from "perfect" time at sample point t in nanoseconds; 

6(t) is the time interval in days between sample points t and t-l; 

y{t) is the clock's frequency deviation from perfect frequency at sample point t in nanoseconds/day; 

w{t) is the drift in frequency in nanoseconds/day 2 ; 

tit), r)it), and ait) are mutually independent white zero mean Gaussian random variables with standard deviations 

\fdit)o c , v/~o"(t)<v and \f6{t)o„, respectively, in nanoseconds, nanoseconds/day, and nanoseconds/day 2 . The 

\f<5{t) arises because the variance of a random walk is proportional to the time interval. 

If o„ = 0, wit) is a constant, to, representing a deterministic linear trend. If both o„ = and w = 0, the model 
is drift-free. The purpose of this study is to evaluate the validity of these models, and to obtain generalized likelihood 
ratio tests for the significance of the drift parameters and maximum likelihood estimates of the parameters o„ o„ 
and w <or o„, if appropriate), for each clock in the ensemble. 



3. The Kalman State Space Model for a Clock Ensemble 



The combination of clocks into a Kalman state space model will be explained by considering the special case of 
a three-clock model. The state transition equation is 

*,(() 
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where subscripts indicate the clocks. 
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In more compact form this is 



X{t) = *(t) Xit-1) + Uit) , t = 1, 2 



(3.1) 



where Xit) is the state vector and O(t) is the state transition matrix. The stochastic elements of the transition from 
one state to the next are contained in U{t). 

The clocks are never read directly, but, using sophisticated electronics, the time differences between clocks can 
be read with a precision of less than a nanosecond. "Perfect" time, which we can never observe, cancels out, leaving 
an observation of the clock error differences. z x {t ) = xM - x 2 U) and z 2 it) = x t it) - x 3 it)- We write the observation 
equation in matrix form as 
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Zit) = Hit) Kit) + Vit) , t = 1, 2, 



(3.2) 



Z(t) is the vector of observed clock time differences and Hit) is the observation matrix. Vit ) is a vector of observational 
errors. We write Hit) as a time-varying matrix because, in the multivariate case, partially missing data is represented 
by deleting rows from H(t) thus changing the dimension of Zit), Hit), and Vit). In some cases it is possible to recover 
from a problem with the reference clock by defining a new (temporary) reference and changing the patterns of + 
1 and zeros in Hit). This is described in detail in Jones and Tryon [4]. 

For small 6it) the covariance matrix of the transition errors, Uit), is of the diagonal form 6it)Q where Q is 



Q = 



On 



0,1 



o«i 



Orf' 



0,2 



0«2 



0.z 



0„3 



(3.3) 



Note that the matrix Q contains all the parameters to be estimated except w. 

If the time offset of each clock is measured to the nearest nanosecond and is otherwise error free, the observational 
error vector Vit ) has covariance matrix 



Rit) 



1/12 
1/12 



(3.4) 



since the variance of a uniformly distributed random variable from -0.5 to 0.5 is 1/12. The dimension of Rit) will 
also vary with time if there are partially missing data. 



4. The Kalman Recursion 

First, we adopt some notation. By X(t|s), s < t we denote the best estimate of the state vector at sample time 
t, based on all the data up to and including sample time s. We define P{t\s) to be the covariance matrix of XU|s). 
Similarly, Z(t\s) will be the predicted observation at time t based on all the data up to time s. 

The recursion proceeds through the following steps (Kalman [3]). 

1. Given the state vector and its covariance matrix at any sample time, say t, we can predict the state vector one 
observation interval into the future using the transition eq (3.1). The prediction is 

Xit+l\t) = 4>(t+l> Xit\t) (4.1) 

since the expected value of the stochastic component of the transition is zero. The covariance matrix of the prediction is 

Pit+l\t) = *(t+l) Pti\t) <t>'(t+l) + 6[t+l)Q, 14.2) 

where the first term on the right is the covariance matrix of the linear operation in the state transition and the second 
term is the covariance of the additive stochastic component. 

2. From the observation matrix and the predicted state we can predict the next observation. 

Z(t+l|f) = Hit+1) X{t+l\t). (4.3) 

3. We next compare the prediction with the actual data (when it becomes available in a real-time application). The 
difference is the recursive residual, or innovation, 

Jit+1) = Zit+l) - Zit+l\t), (4.4) 



which has covariance matrix 

Cit+l) = Hit+l) P(t+l\t) H'it+1) + Rit+1). (4.5) 

The residuals are critically important. Later, they will be used for checking the fit of the model and estimating 
the parameters. For now, we will use the information in the residual to complete the recursion. 

4. Let 

AU+U = Pit+l\t) H'lt+1) C-'it+l). (4.6) 

This is sometimes called the Kalman gain. 

Using the Kalman gain we can update the predicted state and its covariance 

X(H-l|t+l) = Xit+l\t) + AU+1) I{t+1) (4.7) 

and 

P(t+l\t+l) = P(t+l\t) - AU+1) Hit+l) Pit+l\t). (4.8) 

This completes the recursion. 



5. Maximum Likelihood Modeling: and Estimation 

In this section we will compare the classic use of the Kalman filter with our method of application. In the classical 
application, unknown parameters to be estimated can appear only in the state vector. All other quantities which 
appear in the recursion must be known. A single pass of the recursion through the data (often in real time) provides 
the latest estimate of the state vector and its covarianee at each point in time. This is exactly how the Kalman filter 
is applied to forming a time scale. Given the model and estimates of the parameters, the recursion gives the updated 
state vector after each new observation. The first, fourth, and seventh, etc., elements of the state vector are estimates 
of the clock errors, from which, along with their uncertainties from the state covarianee matrix,, we can estimate time. 

Our interest, however, is in evaluating how well the clock model fits real data, and in estimating the parameters. 
To do this, we make use of two facts concerning the residuals, I{t) y t = 1, 2, ..., N. 

1. If the model is correct, and the recursion is run through the data with the correct values of the unknown parameters, 
the residuals will form Gaussian white noise series. Standard statistical tests can then be used to determine if the 
residuals are in fact Gnassian white noise. If not, the structure of the residuals can often give some clues as to the 
defects in the model. 

2. From the residuals and their covarianee matrix, obtained by running the recursion with any fixed values of the 
unknown parameters, the likelihood function for those parameter vahies can be computed. Specifically, the -2 In 
likelihood function (ignoring the additive constant! is 

L = X [In \Cit) | + I'M C-Mi) /**>]. (5.1) 

t 

The maximum likelihood estimates of the parameters are those for which L is minimized. Starting from initial 
guesses, we can use iterative nonlinear function minimization methods to find the estimates. Evaluating the function 
L for new trial values of the parameters requires one pass of the Kalman recursion through the data. Note that 
the parameters to be estimated may appear anywhere in the structure of the Kalman recursion. 



6. Computational Procedures 

It is not necessary to invert the residual covarianee matrix, C{t), when computing the likelihood function or updating 
the predicted state vector and its covarianee. If the upper triangular portion of C(t+l) is augmented by 
H{t+l)Pti+l\t), and I(t+l) to form the partitioned matrix 

[Cit+D | Hit+1) Pit+l\t) | 7(t+l>], (6.1) 

a single call to a Choleskey factorization routine (Graybill [5], p. 232) which factors 

Cit+1) = T'it+1) Tit+1), (6.2) 

where jRf+1) is upper triangular, will replace the partitioned matrix by 

[Tit+D | [T'it+DY 1 Hit+1) Pit+l\t) 1 [T'U+D]- 1 Iit+D). (6.3) 

If we call the parts of the new partitioned matrix 

[Tit+D | Bit+1) \Dit+D], (6.4) 

the updated state and state covarianee matrices become 



X(t+1|H-1» = X(t+l\t) + B'(t+1) D(t+\) (6.5) 

and 

P(t+1|H-D = P(H-l|t) - B'it+1) Bit+ll (6.6) 

The -2 In likelihood function becomes 

Z, = 2 (Z In i>Jt) + D'it) D(t)) (6.7) 

t i 

where t u {t) are the diagonal elements of Tit). 

Missing data are easily handled by the Kalman filter procedure. However, there are two distinct situations. First, 
all the data may be missing at one or several successive observation times, and second, occurring only in multivariate 
data, only part of the data may be available on a given occasion. For example, one or more, but not all, of the 
clock differences are missing on that occasion. 

The first case is most easily dealt with. In the present model, the unequal spacing feature can account for any 
size gap in the data by proper choice of d(r). This, however, is a feature of random walk based models and may 
not apply to other models. A more general procedure for missing data is the following: Suppose k ^ 1 successive 
observations are missing. Simply repeat step one in the recursion k additional times to obtain the predicted state, 
X(t+k+l \t), and its covariance, Pit+k+1 \t), for the next observation time for which there is data. Then proceed 
with the remainder of the recursion. There is no residual and no contribution to the likelihood function when there 
is no data (Jones [6]). 

In the second case, partial loss of data, i.e., missing elements of the data vector Z(t), can be represented by eliminating 
the corresponding rows of the observation matrix, Hit). Similarly, rows of the observational error vector V(t) and 
rows and columns of the error covariance matrix R{t) causes no problem with the remaining steps of the recursion, 
which proceed normally. The information available in the data is propagated through the recursion to correctly update 
the state vector prediction and its estimated covariance matrix. 

The matrix operations in the recursion are, for reasons of efficiency, best coded to take advantage of the specific 
structure (many zeros) of each matrix. Using available general purpose subroutines to perform the required matrix 
operations would increase the cost of computer resources enormously. These calculations can be adapted to a 
time-varying Hit). One approach is to use indicator variables for missing data to form the various reduced matrices 
directly. Another approach is to note that pre- and/or post-multiplying by Hit) with rows missing is equivalent to 
performing the operation with the original H matrix and then eliminating the appropriate rows and/or columns of 
the product matrix and closing it up. For example, the residual covariance matrix 

Cit+1) = Hit+1) Pit+l\t) H'it+1) + Rit+1) 

with, say, the second and fourth elements of the data vector missing can be obtained by using the original H and 
R but then eliminating the second and fourth rows and columns from Cit+l). It is straightforward to write general 
purpose subroutines to eliminate rows and/or columns of a matrix and close up the spaces. Again, this can be done 
in response to indicator variables denoting missing observations. 

To start the recursion the initial state X(0|0) and its covariance P(0|0) are needed. Both depend on the application. 
The time states could be set to zero and the clocks set to agree with International Atomic Time as defined by the 
BIH, The corresponding variances on the diagonal of P(0|0) would be determined by the accuracy of the 
intercomparison. If a strictly internal application such as parameter estimation or a time interval measurement was 
desired, the time states could be set to agree with the difference readings at t = 0, with observational error variances 
set to the appropriate value for roundoff error. The frequency states might be measured by intercomparison with 
the frequency standards and the variance determined from the measurement uncertainty. When a measurement is 
not possible a best guess at the frequency offset can be used with a corresponding guess at its uncertainty used for 
the variance. 



7. Data Analysis 

The data used for this analysis were collected over a period of 333 days beginning on February 16, 1979, during 
which seven clocks listed below were in essentially uninterrupted normal operation. Clock differences were recorded 
daily to the nearest nanosecond. 



Clock Number 



Identification 



601 (Reference) 
167 
137 
1316 
323 
324 
8 



HP model S061A option 004 serial #601 

HP model 5060A serial #167 

HP model 5060A serial #137 

HP model 5061A option 004 serial #1316 

HP model 5061A 004 serial #323 

HP model 5061A 004 serial #324 

Frequency and Time System model 400 serial 



In addition to occasional variability in the time of data collection, two successive days were missing entirely. On 
three other occasions, single clocks had obvious read errors which when discarded resulted in partially missing data. 
Several dozen other observations which had been flagged as possible errors by the ad hoc procedure then in use 
were examined visually, judged to be not serious, and retained in the analysis. 

Table 1 gives the results of the computations. First we note that the inclusion of deterministic trends (model II) 
results in a drop of 41 in the -2 In likelihood function value over the no-trend model (model I). The reduction is 
distributed as X 2 (6) under the hypothesis of model I, so this is highly significant. There is less than one chance in 
a thousand that this is a spurious result. Three of the seven clocks had significant trends. The inclusion of random 
walk drifts (model III), however, makes no improvement. Furthermore, the estimated values for the o a 's are all 
approximately zero and the estimated initial values of the drift, u?(0), are about the same as the deterministic drifts. 
The conclusion is that several frequency trends are present in this data and that they are constant over the year, 
rather than wandering as a random walk. It should be noted that one of the drifting clocks had just begun operation 
while the remaining two were nearing their end of life and failed shortly after the test period. The remaining clocks 
were in their "middle years." The standard errors of the parameter estimates given in the table are obtained from 
the estimated covariance matrix as approximated by twice the inverse of the Hessian of the -2 In likeliho^' 1 function, L. 

In model II it is necessary to constrain the solution so that the sum of the drifts is zero. This means that we cannot 
detect a common drift from clock difference readings. Because of this constraint, a standard error for clock #8 is 
not available. Due to the large cost of computing the 28 x 28 Hessian for model III, which we have rejected anyway, 
the standard errors of the estimates have not been computed. 

Figures la-f are integrated periodograms of the six series of residuals. The parallel lines represent five percent 
significance test limits for white noise. That is, only five percent of actual white noise series so tested will wander 
outside the lines. In this data set, only one pair of clocks (601 minus 167) two) show a slight deviation from white 
noise. We conclude that this model fits very well. Figures 2a-f are histograms of the six residual series. Also given 
are three different tests for normality. All the series are well approximated by the normal (Gaussian) distribution 
which verifies that the assumed likelihood function was reasonable. 

We note in conclusion that, having selected the constant drift model, we could reduce the dimension of the state 
space model by one-third by rewriting the clock model as 
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The state prediction in the first step of the recursion then becomes 

X(t+l\t) = *(H-1) X(t\t) + Ait+1) 



where 



Ait) = 



6 2 {t) wx 
2 

6(t)u>i 

d" 2 (r) w 2 , 



2 
6{t)w 2 



and Kit) and <t>U) are redefined from the clock model in the obvious way. This model would be suitable for estimation 
purposes but not foi a time scale algorithm. With the drift in the state vector we can enter the uncertainty of its 
estimation in the initial state covariance matrix. It is then propagated through to the uncertainty in time. The simplified 
model above assumes w is known without error. 

TABLE 1. Results of data analysis. 





Model I 




No Trend 




L = 10609.1 


Clock 




Number 


Param. Est.* S.E 


601 


b, = 7.42 .32 




ol = .86 .24 



167 3, = 13.45 .56 

b, = 1.15 .39 



137 6. = 10.03 .45 

3, = 1.71 .36 



1316 



o. 



3.61 .24 
1.29 .24 



323 b. 



3.27 .24 
1.54 .21 



324 5, 



3.30 .25 
1.42 .23 



8 b. 

b„ 



9.08 .43 
2.68 .39 



Model II 
Deterministic Trend 

L = 10567.8 



Param. Est.* S.E. 



o, 
w 



= 7.46 .32 
= .44 .26 
= .152 .038 



13.45 

1.11 
.052 



.56 
.36 
.061 



o, 
5, 

w 



10.04 
1.60 

.179 



.45 
.36 
.081 



w 



3.62 
1.36 
-.017 



.25 
.24 
.070 



a. 

HI 



3.53 .22 

.73 .20 

-.313 .046 



o, 

w 



3.30 .25 
1.40 .22 
.035 .072 



w 



9.09 .43 
2.65 .39 
-.088 - 



Model III 
Random Walk Drift 

L = 10567.7 



Param. Est.* 



o, = 

w\ = 



7.47 
.48 
.152 



= 4.1x10"' 



Oc 



«<0} = 



13.46 
1.07 

.054 



= 3.7x10"' 



a, 

b„ 



= 10.06 

= 1.57 

= .178 

= 2.2x10" 



a, 
w(0> 



= 3.59 
= 1.38 
= -.015 

= 1.3*10^ 



o. 

K 

w 

b. 



3.52 
.70 
-.314 
4.9xl0" ; 



&<0> 
b. 



= 3.30 

= 1.41 
= .035 

= 7.9xl0>" ! 



b. 



9.09 
2.66 
-.090 



= 3.9x10" 



,-s- 



*Units are in nanoseconds. 
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8. Conclusion 

The Kalman filter is a powerful tool for maximum likelihood model fitting and parameter estimation in time series 
analysis. We have established the validity of the proposed models for clock behavior and obtained precise estimates 
of clock parameters. The existence of deterministic frequency drifts over a period of a year has been demonstrated 
in clocks at the beginning and end of their life span. 

The primary disadvantage of these methods is the cost of function evaluation in the nonlinear optimization process. 
The analysis of a year of data on seven clocks takes 10 minutes on a CDC Cyber 170/750 computer and 10 hours 
on the DEC 11/70. Use of state-of-the-art optimization codes is essential. We use an optimization package especially 
designed for maximum likelihood applications. It is adapted from the package ■written by Weiss [7], based on algorithms 
by Dennis and Schnabel [8]. The code used on the CDC Cyber 170/750 is part of a preliminary version of the National 
Bureau of Standard's STARPAC library [9]. 
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Engineering Laboratory. 



9. References 

[1] Gauss C. F. Theoria combinationis observationum erroribus minimis obnoxiae. Werke, 4. Gottingen. 1821. 

(Collected works 1873) 
[2] Plackett, R. L. Some theorems in least squares. Biometrika, 19: 149-157; 1950. 
[3] Kalman, R. E. A new approach to linear filtering and prediction problems. ASME Transactions, Part D, Journal 

of Basic Engineering, 82: 35-45; 1960. 
[4] Jones, R. H.; Tryon, P. V. Estimating time from atomic clocks. Second Symposium on Atomic Time Scale 

Algorithms, June 23-25, 1982, Boulder, CO. 
[5] Graybill, F. A. Theory and application of the linear model. North Scituate: Duxbury Press; 1976. 
[6] Jones, R. H. Maximum likelihood fitting of ARMA models to time series with missing observations. Technometics, 

22: 389-395; 1980. 
[7] Weiss, B. E. A modular software package for solving unconstrained nonlinear optimization problems. Master's 

thesis, University of Colorado at Boulder; 1980. 
[8] Dennis, J. E.; Schnabel, R. B. Numerical methods for unconstrained optimization and nonlinear equations. 

New Jersey: Prentice Hall; 1982. 
[9] Donaldson, J. R.; Tryon, P. V. STARPAC: The standards time series and regression package. 

Unpublished — Documentation available from Janet P. Donaldson, NBS-714, 325 Broadway, Boulder, CO 

80303. 



Figures la-f and 2a-f follow. 
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FIGURE la. Residuals from clock 601 minus clock 167. 
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FIGURE lb. Residuals from clock 601 minus clock 137. 
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Figure Ic. Residuals from clock 601 minus clock 1316, 
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FIGURE 2a. Histogram — residuals from clock 601 minus clock 167. 
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FIGURE 2b. Histogram — residuals from clock 601 minus clock 137. 
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Figure 2c. Histogram — residuals from clock 601 minus clock 1316. 
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FlGUKE 2d. Histogram — residuals from clock 601 minus clock 323. 
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FIGURE 2e. Histogram — residuals from clock 601 minus clock 324. 
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FIGURE 2f. Histogram — residuals from clock 601 minus clock 8. 
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